The coronavirus disease 2019 (COVID-19) time series lists confirmed cases, reported deaths, and active cases, and compares COVID-19 with other epidemics. Data are disaggregated by country (and sometimes by subregion). COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has had a worldwide effect.
This notebook uses data from various sources to understand, analyze and visualize the changes in the number of cases through different visualization techniques and plots.
This dataset includes time-series data tracking the number of people affected by COVID-19 worldwide, including:
Note: Data collection for recovered cases is not very accurate. Many countries have stopped reporting recoveries, figures differ between sources, and many recoveries are never reported, so they cannot be analyzed accurately. The data reported until July 2020 is nevertheless reliable enough to interpret. Select the range from January 2020 to January 2021 to see the transitions clearly in all the graphs.
Therefore the following two scenarios should also be kept in mind:
Additional References for the above point:
Most of these reports are based on US data, but their findings largely hold for the rest of the world.
pip install folium
Requirement already satisfied: folium in c:\users\parth\anaconda3\lib\site-packages (0.12.1.post1) Requirement already satisfied: branca>=0.3.0 in c:\users\parth\anaconda3\lib\site-packages (from folium) (0.4.2) Requirement already satisfied: jinja2>=2.9 in c:\users\parth\anaconda3\lib\site-packages (from folium) (2.11.3) Requirement already satisfied: requests in c:\users\parth\anaconda3\lib\site-packages (from folium) (2.26.0) Requirement already satisfied: numpy in c:\users\parth\anaconda3\lib\site-packages (from folium) (1.20.3) Requirement already satisfied: MarkupSafe>=0.23 in c:\users\parth\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (1.1.1) Requirement already satisfied: certifi>=2017.4.17 in c:\users\parth\anaconda3\lib\site-packages (from requests->folium) (2021.10.8) Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\parth\anaconda3\lib\site-packages (from requests->folium) (1.26.7) Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\parth\anaconda3\lib\site-packages (from requests->folium) (2.0.4) Requirement already satisfied: idna<4,>=2.5 in c:\users\parth\anaconda3\lib\site-packages (from requests->folium) (3.2) Note: you may need to restart the kernel to use updated packages.
pip install plotly
Requirement already satisfied: plotly in c:\users\parth\anaconda3\lib\site-packages (5.5.0) Requirement already satisfied: six in c:\users\parth\anaconda3\lib\site-packages (from plotly) (1.16.0) Requirement already satisfied: tenacity>=6.2.0 in c:\users\parth\anaconda3\lib\site-packages (from plotly) (8.0.1) Note: you may need to restart the kernel to use updated packages.
#required packages and libraries
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import folium
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import math
import random
from datetime import timedelta
import warnings
warnings.filterwarnings('ignore')
# color palette
cnf ='#393e46'
dth ='#ff2e63'
rec ='#21bf73'
act ='#fe9801'
import plotly as py
py.offline.init_notebook_mode(connected = True)
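import shutil
# Remove any earlier clone so the git clone below starts from a clean directory.
# shutil.rmtree works on every platform; `rm -rf` is not available in the default Windows shell.
shutil.rmtree('Covid-19-Preprocessed-Dataset', ignore_errors=True)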
# cloning the data
!git clone https://github.com/laxmimerit/Covid-19-Preprocessed-Dataset.git
fatal: destination path 'Covid-19-Preprocessed-Dataset' already exists and is not an empty directory.
# Understanding the Data
df = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/covid_19_data_cleaned.csv', parse_dates=['Date'])
df
| | Date | Province/State | Country | Lat | Long | Confirmed | Recovered | Deaths | Active |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | NaN | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 1 | 2020-01-23 | NaN | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 2 | 2020-01-24 | NaN | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 3 | 2020-01-25 | NaN | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 4 | 2020-01-26 | NaN | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 212493 | 2022-01-29 | NaN | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212494 | 2022-01-30 | NaN | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212495 | 2022-01-31 | NaN | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212496 | 2022-02-01 | NaN | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212497 | 2022-02-02 | NaN | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
212498 rows × 9 columns
# We need to handle the NaN values which are there under the Province/State column
df['Province/State'] = df['Province/State'].fillna("")
df
| | Date | Province/State | Country | Lat | Long | Confirmed | Recovered | Deaths | Active |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 1 | 2020-01-23 | | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 2 | 2020-01-24 | | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 3 | 2020-01-25 | | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| 4 | 2020-01-26 | | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 212493 | 2022-01-29 | | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212494 | 2022-01-30 | | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212495 | 2022-01-31 | | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212496 | 2022-02-01 | | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
| 212497 | 2022-02-02 | | Timor-Leste | -8.87420 | 125.727500 | 0 | 0 | 0 | 0 |
212498 rows × 9 columns
#importing the country_daywise, countrywise, daywise dataset
country_daywise = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/country_daywise.csv', parse_dates=['Date'])
countrywise = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/countrywise.csv')
daywise = pd.read_csv('Covid-19-Preprocessed-Dataset/preprocessed/daywise.csv', parse_dates=['Date'])
country_daywise
| | Date | Country | Confirmed | Deaths | Recovered | Active | New Cases | New Recovered | New Deaths |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-23 | Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2020-01-24 | Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2020-01-25 | Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2020-01-26 | Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2020-01-27 | Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 145035 | 2022-01-27 | Zimbabwe | 229096 | 5324 | 0 | 223772 | 153 | 0 | 3 |
| 145036 | 2022-01-28 | Zimbabwe | 229333 | 5333 | 0 | 224000 | 237 | 0 | 9 |
| 145037 | 2022-01-29 | Zimbabwe | 229415 | 5333 | 0 | 224082 | 82 | 0 | 0 |
| 145038 | 2022-01-30 | Zimbabwe | 229460 | 5337 | 0 | 224123 | 45 | 0 | 4 |
| 145039 | 2022-01-31 | Zimbabwe | 229666 | 5338 | 0 | 224328 | 206 | 0 | 1 |
145040 rows × 9 columns
#looking at the number of confirmed cases
# Select the column before summing so that non-numeric columns are never aggregated
confirmed = df.groupby('Date')['Confirmed'].sum().reset_index()
confirmed
| | Date | Confirmed |
|---|---|---|
| 0 | 2020-01-22 | 557 |
| 1 | 2020-01-23 | 655 |
| 2 | 2020-01-24 | 941 |
| 3 | 2020-01-25 | 1434 |
| 4 | 2020-01-26 | 2118 |
| ... | ... | ... |
| 738 | 2022-01-29 | 372549524 |
| 739 | 2022-01-30 | 374778211 |
| 740 | 2022-01-31 | 378391869 |
| 741 | 2022-02-01 | 381683860 |
| 742 | 2022-02-02 | 384701043 |
743 rows × 2 columns
#looking at the recovered cases
# Select the column before summing so that non-numeric columns are never aggregated
recovered = df.groupby('Date')['Recovered'].sum().reset_index()
recovered
| | Date | Recovered |
|---|---|---|
| 0 | 2020-01-22 | 30 |
| 1 | 2020-01-23 | 32 |
| 2 | 2020-01-24 | 39 |
| 3 | 2020-01-25 | 42 |
| 4 | 2020-01-26 | 56 |
| ... | ... | ... |
| 738 | 2022-01-29 | 0 |
| 739 | 2022-01-30 | 0 |
| 740 | 2022-01-31 | 0 |
| 741 | 2022-02-01 | 0 |
| 742 | 2022-02-02 | 0 |
743 rows × 2 columns
#looking at the number of death cases
# Select the column before summing so that non-numeric columns are never aggregated
deaths = df.groupby('Date')['Deaths'].sum().reset_index()
deaths
| | Date | Deaths |
|---|---|---|
| 0 | 2020-01-22 | 17 |
| 1 | 2020-01-23 | 18 |
| 2 | 2020-01-24 | 26 |
| 3 | 2020-01-25 | 42 |
| 4 | 2020-01-26 | 56 |
| ... | ... | ... |
| 738 | 2022-01-29 | 5659009 |
| 739 | 2022-01-30 | 5664564 |
| 740 | 2022-01-31 | 5674374 |
| 741 | 2022-02-01 | 5688629 |
| 742 | 2022-02-02 | 5699993 |
743 rows × 2 columns
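The three per-date totals above can also be computed in a single pass. The sketch below uses a tiny made-up frame rather than the real dataset; `toy` and `totals` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Toy stand-in for df: two locations reported on two dates (illustrative values only)
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-22', '2020-01-22', '2020-01-23', '2020-01-23']),
    'Country': ['A', 'B', 'A', 'B'],
    'Confirmed': [1, 2, 3, 5],
    'Recovered': [0, 1, 1, 2],
    'Deaths': [0, 0, 1, 1],
})

# Select the numeric columns first, then sum once per date
totals = toy.groupby('Date')[['Confirmed', 'Recovered', 'Deaths']].sum().reset_index()
print(totals)
```

Each of the three single-column frames is then just a two-column slice of `totals`, for example `totals[['Date', 'Deaths']]`.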
#check for any null values
df.isnull().sum()
Date              0
Province/State    0
Country           0
Lat               0
Long              0
Confirmed         0
Recovered         0
Deaths            0
Active            0
dtype: int64
df.query('Country == "US"')
| | Date | Province/State | Country | Lat | Long | Confirmed | Recovered | Deaths | Active |
|---|---|---|---|---|---|---|---|---|---|
| 189465 | 2020-01-22 | | US | 40.0 | -100.0 | 1 | 0 | 0 | 1 |
| 189466 | 2020-01-23 | | US | 40.0 | -100.0 | 1 | 0 | 0 | 1 |
| 189467 | 2020-01-24 | | US | 40.0 | -100.0 | 2 | 0 | 0 | 2 |
| 189468 | 2020-01-25 | | US | 40.0 | -100.0 | 2 | 0 | 0 | 2 |
| 189469 | 2020-01-26 | | US | 40.0 | -100.0 | 5 | 0 | 0 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 190203 | 2022-01-29 | | US | 40.0 | -100.0 | 74232238 | 0 | 884405 | 73347833 |
| 190204 | 2022-01-30 | | US | 40.0 | -100.0 | 74424305 | 0 | 884726 | 73539579 |
| 190205 | 2022-01-31 | | US | 40.0 | -100.0 | 74951445 | 0 | 887148 | 74064297 |
| 190206 | 2022-02-01 | | US | 40.0 | -100.0 | 75350359 | 0 | 890770 | 74459589 |
| 190207 | 2022-02-02 | | US | 40.0 | -100.0 | 75680487 | 0 | 894316 | 74786171 |
743 rows × 9 columns
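In the US rows shown above, Active appears to equal Confirmed minus Deaths minus Recovered. That relationship is inferred from the displayed values rather than taken from the dataset's documentation, so treat the check below as a sketch. It also suggests why Active stays close to Confirmed once Recovered is reported as 0.

```python
import pandas as pd

# Two US rows copied from the table above
us = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-29', '2022-02-02']),
    'Confirmed': [74232238, 75680487],
    'Recovered': [0, 0],
    'Deaths': [884405, 894316],
    'Active': [73347833, 74786171],
})

# Assumed relationship: Active = Confirmed - Deaths - Recovered
derived_active = us['Confirmed'] - us['Deaths'] - us['Recovered']
matches = bool((derived_active == us['Active']).all())
print(matches)
```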
fig = go.Figure()
fig.add_trace(go.Scatter(x = confirmed['Date'], y= confirmed['Confirmed'], mode='lines+markers', name='Confirmed Cases', line=dict(color='Orange',width =2)))
fig.add_trace(go.Scatter(x = recovered['Date'], y= recovered['Recovered'], mode='lines+markers', name='Recovered Cases', line=dict(color='Green', width =2)))
fig.add_trace(go.Scatter(x = deaths['Date'], y= deaths['Deaths'], mode='lines+markers', name='Deaths', line=dict(color='Red', width=2)))
fig.update_layout(title = "WorldWide Covid-19 Cases", xaxis_tickfont_size=14, yaxis= dict(title='Number of Cases'))
fig.show()
#convert date to string
df['Date'] = df['Date'].astype(str)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212498 entries, 0 to 212497
Data columns (total 9 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   Date            212498 non-null  object
 1   Province/State  212498 non-null  object
 2   Country         212498 non-null  object
 3   Lat             212498 non-null  float64
 4   Long            212498 non-null  float64
 5   Confirmed       212498 non-null  int64
 6   Recovered       212498 non-null  int64
 7   Deaths          212498 non-null  int64
 8   Active          212498 non-null  int64
dtypes: float64(2), int64(4), object(3)
memory usage: 14.6+ MB
#using plotly express to display a density map
fig =px.density_mapbox(df, lat='Lat', lon='Long', hover_name='Country', hover_data=['Confirmed', 'Recovered', 'Deaths'], animation_frame='Date',color_continuous_scale='Portland', radius=7, zoom=0,height=700)
fig.update_layout(title='WorldWide Covid-19 Cases with TimeLapse')
fig.update_layout(mapbox_style='open-street-map', mapbox_center_lon=0)
fig.show()
df['Date'] =pd.to_datetime(df['Date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212498 entries, 0 to 212497
Data columns (total 9 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   Date            212498 non-null  datetime64[ns]
 1   Province/State  212498 non-null  object
 2   Country         212498 non-null  object
 3   Lat             212498 non-null  float64
 4   Long            212498 non-null  float64
 5   Confirmed       212498 non-null  int64
 6   Recovered       212498 non-null  int64
 7   Deaths          212498 non-null  int64
 8   Active          212498 non-null  int64
dtypes: datetime64[ns](1), float64(2), int64(4), object(2)
memory usage: 14.6+ MB
# Use a list (double brackets) to select several columns from a groupby
temp = df.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
temp = temp[temp['Date']==max(temp['Date'])].reset_index(drop =True)
tm =temp.melt(id_vars = 'Date', value_vars = ['Active', 'Deaths', 'Recovered'])
fig = px.treemap(tm , path = ['variable'], values='value', height = 250, width = 800, color_discrete_sequence=[act, dth, rec])
fig.data[0].textinfo ='label+text+value'
fig.show()
# Use a list (double brackets) to select several columns from a groupby
temp = df.groupby('Date')[['Recovered', 'Deaths', 'Active']].sum().reset_index()
temp = temp.melt(id_vars = 'Date', value_vars =['Recovered', 'Deaths', 'Active'], var_name = 'Case', value_name = 'Count')
fig = px.area(temp, x='Date', y='Count', color='Case', height=500, title='Cases Over Time', color_discrete_sequence=[rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()
#Worldwide cases on Folium Maps
temp =df[df['Date'] == max(df['Date'])]
m =folium.Map(location =[0,0], tiles='cartodbpositron', min_zoom =1, max_zoom= 4, zoom_start=1)
for i in range(0, len(temp)):
    folium.Circle(location = [temp.iloc[i]['Lat'], temp.iloc[i]['Long']], color='crimson', fill = 'crimson',
                  tooltip = '<li><bold> Country: ' + str(temp.iloc[i]['Country'])+
                            '<li><bold> Province: ' + str(temp.iloc[i]['Province/State'])+
                            '<li><bold> Confirmed: ' + str(temp.iloc[i]['Confirmed'])+
                            '<li><bold> Deaths: ' + str(temp.iloc[i]['Deaths']),
                  radius = int(temp.iloc[i]['Confirmed'])**0.5).add_to(m)
m
fig = px.choropleth(country_daywise, locations='Country', locationmode='country names', color=country_daywise['Confirmed'],
hover_name = 'Country', animation_frame=country_daywise['Date'].dt.strftime('%Y-%m-%d'),
title = 'Cases Over Time in Different Countries', color_continuous_scale=px.colors.sequential.Inferno)
fig.update(layout_coloraxis_showscale=True)
fig.show()
fig_c =px.bar(daywise, x='Date', y='Confirmed', color_discrete_sequence=[act])
fig_d =px.bar(daywise, x='Date', y='Deaths', color_discrete_sequence=[dth])
fig = make_subplots(rows =1, cols =2, shared_xaxes=False, horizontal_spacing= 0.1,
subplot_titles=('Confirmed Cases', 'Death Cases'))
fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)
fig.update_layout(height=400)
fig.show()
fig_c = px.choropleth(countrywise, locations='Country', locationmode='country names',
color=np.log(countrywise['Confirmed']), hover_name='Country', hover_data=['Confirmed'])
temp = countrywise[countrywise['Deaths']> 0]
fig_d = px.choropleth(temp, locations='Country', locationmode='country names',
color=np.log(temp['Deaths']), hover_name='Country', hover_data=['Deaths'])
fig = make_subplots(rows=1, cols=2, subplot_titles=['Confirmed Cases', 'Deaths'],
specs=[[{'type':'choropleth'},{'type':'choropleth'}]])
fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)
fig.update(layout_coloraxis_showscale=False)
fig.show()
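The deaths map above filters out zero counts before taking the logarithm, while the confirmed map does not. A minimal sketch of the same guard applied to confirmed counts follows, on a made-up frame; `toy` and `positive` are hypothetical names, and whether the real `countrywise` table contains zero-confirmed rows is not checked here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for countrywise (made-up values)
toy = pd.DataFrame({'Country': ['A', 'B', 'C'], 'Confirmed': [0, 10, 1000]})

# Drop zero counts first: np.log(0) evaluates to -inf, which cannot be placed on a color scale
positive = toy[toy['Confirmed'] > 0].copy()
positive['log Confirmed'] = np.log(positive['Confirmed'])
print(positive)
```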
Note: The data is accurate only until July 2021. After that, recovery figures were no longer reported or tracked by the reporting institutions, which makes it difficult to estimate recoveries from this data.
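One way to find where recovery reporting stops is to look for the last date with a non-zero Recovered total and restrict recovery-based plots to that period. The sketch below runs on made-up daily totals; the dates and values are illustrative only and do not reproduce the cutoff in the real data.

```python
import pandas as pd

# Toy daily totals; the dates and values are made up for illustration
toy_daywise = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-04', '2021-01-05']),
    'Recovered': [120, 135, 140, 0, 0],
})

# Last date on which any recovery was reported
reported = toy_daywise[toy_daywise['Recovered'] > 0]
last_reported = reported['Date'].max()

# Keep only the period for which the Recovered column carries information
usable = toy_daywise[toy_daywise['Date'] <= last_reported]
print(last_reported.date(), len(usable))
```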
daywise.columns
Index(['Date', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'New Cases',
'Deaths / 100 Cases', 'Recovered / 100 Cases', 'Deaths / 100 Recovered',
'No. of Countries'],
dtype='object')
fig1 = px.line(daywise, x='Date', y='Deaths / 100 Cases', color_discrete_sequence=[dth])
fig2 = px.line(daywise, x='Date', y='Recovered / 100 Cases', color_discrete_sequence=[rec])
fig3 = px.line(daywise, x='Date', y='Deaths / 100 Recovered', color_discrete_sequence=[rec])
fig = make_subplots(rows =1, cols=3, shared_xaxes=False,
subplot_titles=('Deaths / 100 Cases','Recovered / 100 Cases','Deaths / 100 Recovered'))
fig.add_trace(fig1['data'][0], row=1,col=1)
fig.add_trace(fig2['data'][0], row=1,col=2)
fig.add_trace(fig3['data'][0], row=1,col=3)
fig.update_layout(height=400)
fig.show()
# Plot daily new cases so the bars match the 'No. of New Cases per Day' subplot title
fig_c = px.bar(daywise, x='Date', y='New Cases', color_discrete_sequence=[act])
fig_d = px.bar(daywise, x='Date', y='No. of Countries', color_discrete_sequence=[dth])
fig =make_subplots(rows =1, cols=2, shared_xaxes=False, horizontal_spacing=0.1, subplot_titles=('No. of New Cases per Day', 'No. of Countries'))
fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)
fig.show()
countrywise.columns
Index(['Country', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'New Cases',
'Deaths / 100 Cases', 'Recovered / 100 Cases', 'Deaths / 100 Recovered',
'Population', 'Cases / Million People', 'Confirmed last week',
'1 week change', '1 week % increase'],
dtype='object')
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}
top =15
#fig for confirmed cases
fig_c = px.bar(countrywise.sort_values('Confirmed').tail(top), x='Confirmed', y='Country',
text = 'Confirmed', orientation='h', color_discrete_sequence=[cnf])
#fig for death cases
fig_d = px.bar(countrywise.sort_values('Deaths').tail(top), x='Deaths', y='Country',
text = 'Deaths', orientation='h', color_discrete_sequence=[dth])
#fig for active cases
fig_a = px.bar(countrywise.sort_values('Active').tail(top), x='Active', y='Country',
text = 'Active', orientation='h', color_discrete_sequence=[act])
# Recovered cases: commented out because the data is incomplete and inconsistent
# fig_r = px.bar(countrywise.sort_values('Recovered').tail(top), x='Recovered', y='Country',
# text = 'Recovered', orientation='h', color_discrete_sequence=[rec])
# Note: no figure that relies on recovered cases is plotted, to avoid misleading results
#fig for deaths / 100 cases
fig_dc = px.bar(countrywise.sort_values('Deaths / 100 Cases').tail(top), x='Deaths / 100 Cases', y='Country',
text = 'Deaths / 100 Cases', orientation='h', color_discrete_sequence=['#ff0000'])
#fig for new cases country wise
fig_nc = px.bar(countrywise.sort_values('New Cases').tail(top), x='New Cases', y='Country',
text = 'New Cases', orientation='h', color_discrete_sequence=['#944dff'])
temp= countrywise[countrywise['Population']>1000000]
#fig for cases per million people
fig_p = px.bar(temp.sort_values('Cases / Million People').tail(top), x='Cases / Million People', y='Country',
text = 'Cases / Million People', orientation='h', color_discrete_sequence=['#3366ff'])
#fig for 1 week changes
fig_ow = px.bar(countrywise.sort_values('1 week change').tail(top), x='1 week change', y='Country',
text = '1 week change', orientation='h', color_discrete_sequence=['#ff6600'])
#fig for 1 week % increase
tem= countrywise[countrywise['Confirmed']>100]
fig_op = px.bar(tem.sort_values('1 week % increase').tail(top), x='1 week % increase', y='Country',
text = '1 week % increase', orientation='h', color_discrete_sequence=['#990033'])
fig= make_subplots(rows=4 , cols=2, shared_xaxes=False, horizontal_spacing= 0.2, vertical_spacing=.05,
subplot_titles=('Confirmed Cases', 'Deaths Reported', 'Active Cases', 'Deaths / 100 Cases'
,'New Cases', 'Cases / Million People', '1 week change', '1 week % increase'))
fig.add_trace(fig_c['data'][0], row=1, col=1)
fig.add_trace(fig_d['data'][0], row=1, col=2)
fig.add_trace(fig_a['data'][0], row=2, col=1)
fig.add_trace(fig_dc['data'][0], row=2, col=2)
fig.add_trace(fig_nc['data'][0], row=3, col=1)
fig.add_trace(fig_p['data'][0], row=3, col=2)
fig.add_trace(fig_ow['data'][0], row=4, col=1)
fig.add_trace(fig_op['data'][0], row=4, col=2)
fig.update_layout(height=3000)
fig.show()
countrywise.sort_values('Deaths', ascending=False).head(15)
| | Country | Confirmed | Deaths | Recovered | Active | New Cases | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Population | Cases / Million People | Confirmed last week | 1 week change | 1 week % increase |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | US | 74951445 | 887148 | 0 | 74064297 | 527140 | 1.18 | 0.0 | 0.0 | 330541757 | 226753.0 | 71783483 | 3167962 | 4.41 |
| 23 | Brazil | 25463530 | 627589 | 0 | 24835941 | 102883 | 2.46 | 0.0 | 0.0 | 422706534 | 60239.0 | 24142032 | 1321498 | 5.47 |
| 79 | India | 41469499 | 496242 | 0 | 40973257 | 167059 | 1.20 | 0.0 | 0.0 | 2751341248 | 15072.0 | 39799202 | 1670297 | 4.20 |
| 144 | Russia | 11670366 | 324672 | 0 | 11345694 | 123033 | 2.78 | 0.0 | 0.0 | 292579178 | 39888.0 | 10988027 | 682339 | 6.21 |
| 115 | Mexico | 4942590 | 306091 | 0 | 4636499 | 12521 | 6.19 | 0.0 | 0.0 | 255584572 | 19338.0 | 4685767 | 256823 | 5.48 |
| 138 | Peru | 3224406 | 205505 | 0 | 3018901 | 0 | 6.37 | 0.0 | 0.0 | 65597846 | 49154.0 | 2976260 | 248146 | 8.34 |
| 186 | United Kingdom | 17431225 | 156281 | 0 | 17274944 | 848962 | 0.90 | 0.0 | 0.0 | 134862019 | 129252.0 | 16063010 | 1368215 | 8.52 |
| 85 | Italy | 10983116 | 146498 | 0 | 10836618 | 57631 | 1.33 | 0.0 | 0.0 | 120822834 | 90903.0 | 10001344 | 981772 | 9.82 |
| 80 | Indonesia | 4353370 | 144320 | 0 | 4209050 | 10185 | 3.32 | 0.0 | 0.0 | 273523621 | 15916.0 | 4289305 | 64065 | 1.49 |
| 37 | Colombia | 5887261 | 134300 | 0 | 5752961 | 15284 | 2.28 | 0.0 | 0.0 | 99141018 | 59383.0 | 5761398 | 125863 | 2.18 |
| 81 | Iran | 6373174 | 132454 | 0 | 6240720 | 28995 | 2.08 | 0.0 | 0.0 | 83992953 | 75877.0 | 6258181 | 114993 | 1.84 |
| 62 | France | 19266496 | 131937 | 0 | 19134559 | 88457 | 0.68 | 0.0 | 0.0 | 68128061 | 282798.0 | 16917220 | 2349276 | 13.89 |
| 6 | Argentina | 8378656 | 121273 | 0 | 8257383 | 43472 | 1.45 | 0.0 | 0.0 | 45195777 | 185386.0 | 7940657 | 437999 | 5.52 |
| 66 | Germany | 10025463 | 117979 | 0 | 9907484 | 179431 | 1.18 | 0.0 | 0.0 | 166310062 | 60282.0 | 8909503 | 1115960 | 12.53 |
| 184 | Ukraine | 4255206 | 106880 | 0 | 4148326 | 23063 | 2.51 | 0.0 | 0.0 | 89634808 | 47473.0 | 4055643 | 199563 | 4.92 |
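The derived columns in this table can be reproduced from the raw counts. The formulas below are inferred from the displayed US and Brazil rows, not taken from the dataset's preprocessing code, so this is a sketch under that assumption; `rows` is a hypothetical name.

```python
import pandas as pd

# US and Brazil rows copied from the table above
rows = pd.DataFrame({
    'Country': ['US', 'Brazil'],
    'Confirmed': [74951445, 25463530],
    'Deaths': [887148, 627589],
    'Population': [330541757, 422706534],
    'Confirmed last week': [71783483, 24142032],
})

# Assumed formulas, inferred from the displayed values
rows['Deaths / 100 Cases'] = (rows['Deaths'] / rows['Confirmed'] * 100).round(2)
rows['Cases / Million People'] = (rows['Confirmed'] / rows['Population'] * 1_000_000).round(0)
rows['1 week change'] = rows['Confirmed'] - rows['Confirmed last week']
rows['1 week % increase'] = (rows['1 week change'] / rows['Confirmed last week'] * 100).round(2)
print(rows.T)
```

For these two rows the results agree with the values printed in the table.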
top =15
fig = px.scatter(countrywise.sort_values('Deaths', ascending=False).head(top),
x = 'Confirmed', y='Deaths', color='Country', height=600, size='Confirmed',
text='Country', log_x = True, log_y = True, title='Deaths vs Confirmed Cases (both axes on log10 scale)')
fig.update_traces(textposition= 'top center')
fig.update_layout(showlegend = True)
fig.update_layout(xaxis_rangeslider_visible = True)
fig.show()
fig = px.bar(country_daywise, x='Date', y='Confirmed', color='Country', height=600,
title='Confirmed Cases', color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()
fig = px.bar(country_daywise, x='Date', y='Deaths', color='Country', height=600,
title='Deaths', color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()
fig = px.bar(country_daywise, x='Date', y='New Cases', color='Country', height=600,
title='New Cases', color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()
fig = px.line(country_daywise, x ='Date', y='Confirmed', color='Country', height=600, title='Confirmed Cases',
color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()
fig = px.line(country_daywise, x ='Date', y='Deaths', color='Country', height=600, title='Deaths',
color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()
fig = px.line(country_daywise, x ='Date', y='New Cases', color='Country', height=600, title='New Cases',
color_discrete_sequence=px.colors.cyclical.mygbm)
fig.show()
df['Date'] =pd.to_datetime(df['Date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212498 entries, 0 to 212497
Data columns (total 9 columns):
 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   Date            212498 non-null  datetime64[ns]
 1   Province/State  212498 non-null  object
 2   Country         212498 non-null  object
 3   Lat             212498 non-null  float64
 4   Long            212498 non-null  float64
 5   Confirmed       212498 non-null  int64
 6   Recovered       212498 non-null  int64
 7   Deaths          212498 non-null  int64
 8   Active          212498 non-null  int64
dtypes: datetime64[ns](1), float64(2), int64(4), object(2)
memory usage: 14.6+ MB
gt_100 = country_daywise[country_daywise['Confirmed']>100]['Country'].unique()
temp = df[df['Country'].isin(gt_100)]
temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>100]
min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']
from_100th_case = pd.merge(temp, min_date, on ='Country')
from_100th_case['N days'] = (from_100th_case['Date'] - from_100th_case['Min Date']).dt.days
fig = px.line(from_100th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 100 Cases', height=600)
fig.show()
gt_1000 = country_daywise[country_daywise['Confirmed']>1000]['Country'].unique()
temp = df[df['Country'].isin(gt_1000)]
temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>1000]
min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']
from_1000th_case = pd.merge(temp, min_date, on ='Country')
from_1000th_case['N days'] = (from_1000th_case['Date'] - from_1000th_case['Min Date']).dt.days
fig = px.line(from_1000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 1000 Cases', height=600)
fig.show()
gt_100000 = country_daywise[country_daywise['Confirmed']>100000]['Country'].unique()
temp = df[df['Country'].isin(gt_100000)]
temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>100000]
min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']
from_100000th_case = pd.merge(temp, min_date,on ='Country')
from_100000th_case['N days'] = (from_100000th_case['Date'] - from_100000th_case['Min Date']).dt.days
fig = px.line(from_100000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 100K Cases', height=600)
fig.show()
gt_1000000 = country_daywise[country_daywise['Confirmed']>1000000]['Country'].unique()
temp = df[df['Country'].isin(gt_1000000)]
temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>1000000]
min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']
from_1000000th_case = pd.merge(temp, min_date,on ='Country')
from_1000000th_case['N days'] = (from_1000000th_case['Date'] - from_1000000th_case['Min Date']).dt.days
fig = px.line(from_1000000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 1Million Cases', height=600)
fig.show()
gt_10000000 = country_daywise[country_daywise['Confirmed']>10000000]['Country'].unique()
temp = df[df['Country'].isin(gt_10000000)]
temp = temp.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp =temp[temp['Confirmed']>10000000]
min_date = temp.groupby('Country')['Date'].min().reset_index()
min_date.columns = ['Country', 'Min Date']
from_10000000th_case = pd.merge(temp, min_date,on ='Country')
from_10000000th_case['N days'] = (from_10000000th_case['Date'] - from_10000000th_case['Min Date']).dt.days
fig = px.line(from_10000000th_case, x = 'N days', y='Confirmed', color='Country', title='N Days From 10Million Cases', height=600)
fig.show()
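The five threshold blocks above repeat the same steps: keep rows above a threshold, find each country's first such date, and count days from it. A hedged sketch of a reusable helper follows; `days_since_threshold` is a hypothetical name, and it assumes one row per country and date (as in `country_daywise`), so province-level data would need to be summed first.

```python
import pandas as pd

def days_since_threshold(frame, threshold):
    """Return rows above `threshold` with an 'N days' column counted
    from each country's first date above that threshold."""
    above = frame[frame['Confirmed'] > threshold].copy()
    first_date = above.groupby('Country')['Date'].transform('min')
    above['N days'] = (above['Date'] - first_date).dt.days
    return above

# Toy data with made-up counts for two countries over four days
toy = pd.DataFrame({
    'Country': ['A'] * 4 + ['B'] * 4,
    'Date': list(pd.date_range('2020-03-01', periods=4)) * 2,
    'Confirmed': [50, 120, 300, 700, 10, 40, 150, 400],
})

result = days_since_threshold(toy, 100)
print(result)
```

With such a helper, each plot would only need a different `threshold` argument and title.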
# Confirmed Cases
full_latest = df[df['Date'] == max(df['Date'])]
fig = px.treemap(full_latest.sort_values(by='Confirmed', ascending=False).reset_index(drop=True),
path = ['Country', 'Province/State'], values='Confirmed', height=700,
title='Number of Confirmed Cases',
color_discrete_sequence=px.colors.qualitative.Dark2)
fig.data[0].textinfo= 'label+text+value'
fig.show()
#deaths
full_latest = df[df['Date'] == max(df['Date'])]
fig = px.treemap(full_latest.sort_values(by='Deaths', ascending=False).reset_index(drop=True),
path = ['Country', 'Province/State'], values='Deaths', height=700,
title='Number of Deaths',
color_discrete_sequence=px.colors.qualitative.Dark2)
fig.data[0].textinfo= 'label+text+value'
fig.show()
first_date= df[df['Confirmed']>0]
first_date = first_date.groupby('Country')['Date'].agg(['min']).reset_index()
# Use a list (double brackets) to select several columns from a groupby
last_date= df.groupby(['Country', 'Date'])[['Confirmed', 'Deaths']]
last_date = last_date.sum().diff().reset_index()
mask = (last_date['Country'] != last_date['Country'].shift(1))
last_date.loc[mask, 'Confirmed'] = np.nan
last_date.loc[mask, 'Deaths'] = np.nan
last_date = last_date[last_date['Confirmed']>0]
last_date = last_date.groupby('Country')['Date'].agg(['max']).reset_index()
first_last = pd.concat([first_date, last_date['max']], axis=1)
first_last['max'] = first_last['max'] + timedelta(days=1)
first_last['Days'] = first_last['max'] - first_last['min']
first_last['Task'] = first_last['Country']
first_last.columns = ['Country', 'Start', 'Finish', 'Days', 'Task']
first_last = first_last.sort_values('Days')
colors = ['#' + ''.join([random.choice('0123456789ABCDEF') for j in range(6)]) for i in range(len(first_last))]
fig = ff.create_gantt(first_last, index_col ='Country', colors=colors, show_colorbar=False,
bar_width =0.2, showgrid_x = True, showgrid_y = True, height= 2500)
fig.show()
temp = country_daywise.groupby(['Country', 'Date'])['Confirmed'].sum().reset_index()
temp = temp[temp['Country'].isin(gt_10000000)]
countries= temp['Country'].unique()
ncols = 3
nrows = math.ceil(len(countries)/ncols)
fig = make_subplots(rows = nrows, cols= ncols, shared_xaxes= False, subplot_titles= countries)
for ind, country in enumerate(countries):
    row = ind // ncols + 1
    col = ind % ncols + 1
    country_rows = temp[temp['Country'] == country]  # dates and counts for this country only
    fig.add_trace(go.Bar(x=country_rows['Date'], y=country_rows['Confirmed'], name=country), row=row, col=col)
fig.update_layout(height=4000, title_text='Confirmed Cases in each Country')
fig.update_layout(showlegend=False)
fig.show()
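The loop above places each country in a grid cell derived from its position in the list. The mapping can be checked on its own, without plotting; `grid_position` is a hypothetical helper name used only for this illustration.

```python
import math

def grid_position(index, ncols):
    """Map a zero-based item index to a one-based (row, col) cell, filling row by row."""
    row, col = divmod(index, ncols)
    return row + 1, col + 1

n_items, ncols = 7, 3
nrows = math.ceil(n_items / ncols)  # rows needed to hold all items
positions = [grid_position(i, ncols) for i in range(n_items)]
print(nrows, positions)
```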
# Source for the figures: Wikipedia
epidemics = pd.DataFrame({
    'epidemic' :['COVID-19', 'SARS', 'EBOLA', 'MERS', 'H1N1'],
    'start_year' : [2019, 2002, 2013, 2012, 2009],
    'end_year' :[2020, 2004, 2016, 2020, 2010],
    'confirmed' :[full_latest['Confirmed'].sum(), 8422, 28646, 2519, 6724149],
    'deaths': [full_latest['Deaths'].sum(), 813, 11323, 866, 19654]
})
#calculating mortality rate
epidemics['mortality'] =round((epidemics['deaths']/epidemics['confirmed'])*100, 2)
temp = epidemics.melt(id_vars='epidemic', value_vars=['confirmed', 'deaths', 'mortality'],
var_name='Case', value_name='Value')
fig= px.bar(temp, x='epidemic', y='Value', color= 'epidemic', text='Value', facet_col='Case',
color_discrete_sequence=px.colors.qualitative.Bold)
fig.update_traces(textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode= 'hide')
fig.update_yaxes(showticklabels= False)
fig.layout.yaxis2.update(matches = None)
fig.layout.yaxis3.update(matches = None)
fig.show()
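As a worked example of the mortality calculation, the sketch below applies the same formula (deaths divided by confirmed cases, times 100, rounded to two decimals) to the four earlier epidemics only, using the figures from the cell above. COVID-19 is left out because its totals come from the loaded dataset; `past` is a hypothetical name.

```python
import pandas as pd

# Figures for the four earlier epidemics, copied from the cell above
past = pd.DataFrame({
    'epidemic': ['SARS', 'EBOLA', 'MERS', 'H1N1'],
    'confirmed': [8422, 28646, 2519, 6724149],
    'deaths': [813, 11323, 866, 19654],
})

# Mortality as deaths per 100 confirmed cases
past['mortality'] = (past['deaths'] / past['confirmed'] * 100).round(2)
print(past)
```

The large spread in these percentages is why the faceted chart gives each facet its own y-axis.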